String theory
C strings are primitive data struc- 
tures only one step removed from 
Fortran’s horrible habit of storing 
characters in numeric variables 
(OK, Fortran-lovers, I know Fortran doesn’t do it 
that way any more!) They tend to confuse pro- 
grammers transferring from languages such as 
Basic, where strings are a complex, subtle, but 
easy-to-use data type. In C’s defence, I should 
point out that there’s a good reason for their 
recalcitrance: C strings have the important ad- 
vantage of being fast for the processor — if not for 
the programmer — to deal with. 
A C string is simply an array of characters.
For example:


char name[10];


defines an array of 10 character variables,
name[0] to name[9] (remember that C arrays start
from 0). A character constant is written using
single quotes, so you get expressions like
name[0]='S'. You can work with character arrays in
exactly the same way you would work with any other
array. At this level C provides very much the same
character handling facilities as Pascal.

Consider for a moment writing a program to 
read in a name and print it out. Using the fixed- 
length string facility as described above, this 
would take a lot of code. You'd have to read in 
each character in turn, store it in the array at the 
correct location, and when it came to printing it 
out you would either need to know the length of 
the valid text or print the entire array. Clearly, 
there has to be a better way. 

To avoid such awkward little problems, C
allows you to use string constants. A string
constant is written between double quotes, and
corresponds to the composite data type character
array. In addition, every string constant has an
implied zero character which marks the end of
the string. You can use a string constant to
initialise a character array in a fairly natural
way, such as:


char name[] = "Sam";


which creates a four-element character array 
with name[0] to name[2], containing the letters 
of the string, and name[3], containing a zero. 

After this one little concession to ease of use, 
that’s your lot. Don’t try: 


name[]="Sam";


or anything like it in your program, because C 
doesn’t understand how to manipulate string 
constants over and above initialisation. 
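
As a minimal sketch of what you can do instead, the
assignment has to go through a library call. The
helper name set_name below is hypothetical, not part
of any library; it assumes the destination is a
fixed-size character array and refuses to copy text
that wouldn't fit.

```c
#include <string.h>

/* Copy src into a fixed-size array dst of dst_size bytes.
   Returns 1 on success, 0 if src plus its null would not fit.
   set_name is a hypothetical helper, not a library function. */
int set_name(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) + 1 > dst_size)
        return 0;          /* no room for the text and its terminating zero */
    strcpy(dst, src);      /* the library does the assignment C itself lacks */
    return 1;
}
```

With char name[10] = "Sam"; a later call such as
set_name(name, sizeof name, "Alexandra") replaces
the contents, where name = "Alexandra" would not
compile.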


No scan do 

You may think I’m being a bit unfair to C here, 
because the standard C print command printf 
and the standard input command scanf both have 
string handling facilities. That is, you can use the 
format specifier %s to read and write strings. For 
example: 


char name[10];

printf("What is your name? ");
scanf("%9s", name);   /* at most 9 characters, leaving room for the null */
printf("\nHello %s", name);


The scanf reads in the characters that the user 
types and stores them in name with null as a ter- 
minator. The printf prints each character in name 
until it reaches the terminator. 

What you have to remember here is that both
scanf and printf are standard functions supplied
as part of the standard I/O library, and are not
really part of the C language itself. Indeed, all the
other string handling facilities of C are provided
by library functions.

For example, scanf isn’t a very good string 
input function, because it stops at the first white 
space character. It’s better to think of scanf as a 
‘read a single word’ facility. If you want to read in 
a string that contains blanks then you need to use
gets(chararray), which will read characters into 
the character array until it hits a newline charac- 
ter, when it terminates the string with a null. 
There’s a corresponding puts() function, but this 
isn’t quite as useful because its only advantage 
over printf is that it automatically adds a newline 
at the end of the output. 
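
One caution: gets has no way of knowing how big the
array is, and it was removed from the C standard in
C11. A minimal sketch of the same 'read a whole line'
idea using the standard fgets follows. The helper
name read_line is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read one line (blanks included) into buf, dropping the newline.
   Returns 1 if a line was read, 0 at end of file.
   read_line is a hypothetical helper built on the standard fgets. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut the string at the newline, if any */
    return 1;
}
```

A call such as read_line(stdin, name, sizeof name)
then stands in for gets(name), with the array size
acting as the limit.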

You need to keep in mind that C string 
handling really amounts to little more than a con- 
vention to use character arrays in a particular 
way and a set of library functions to work with 
them. For example, how do you copy a string 
from one array to another? If you like cryptic C, 
try the following: 


for(i=0; copy[i]=name[i]; i++);


which is one of the many fancy ways of copying a 
string from one array to another. The reason why 
the for loop stops is that the assignment returns 
the assigned value as its result, so the loop stops 
on the null. If you use explicit pointers to the 
character arrays, you can even manage to copy a 
string without using the index variable. 
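
The pointer version mentioned above might look
something like this sketch. The function name
copy_string is illustrative, and it assumes the
caller has made the destination array big enough.

```c
/* Copy the string at src to dst using pointers only, no index variable.
   copy_string is an illustrative name; the caller must make sure
   dst has room for the text and its terminating null. */
void copy_string(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;   /* the assignment yields the character copied; the null ends the loop */
}
```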

Interesting though all of this is, there’s a lot 
to be said for sticking with the supplied string 
functions. 


Expanding the repertoire 
So what few string functions should every good 
C programmer know in addition to gets and puts?


• strlen(string)
Returns the length of the string. Notice that this
isn't the same as the size of the array used to hold
the string, but is the number of characters before
the null.


• strcat(string1,string2)
Adds the contents of string2 to the end of string1.


• strcmp(string1,string2)
Compares the two strings and returns a value of
0 if they're the same, less than 0 if string1 is less
than string2, and more than 0 otherwise.


• strcpy(string1,string2)
Transfers the contents of string2 into string1.


These are all the string functions that you really 
need, but there are lots of others, because you 
can do anything you want to as long as you 
remember that you can convert array references 
to addresses. For example: 


strcpy(&copy[5],newstr);


will overwrite what's stored from copy[5] onwards
with the contents of newstr, with the effect of
changing the end of the string.

Similarly, to delete a portion of a string, say, 
from m to n you could use: 


strcpy(&string[m],&string[n]);
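
One caveat with the deletion trick: the source and
destination overlap, and the C standard leaves the
behaviour of the string copy function undefined in
that case. A hedged sketch of the same idea using
memmove, which is specified to cope with overlap,
is shown here. The name delete_range is hypothetical,
and it assumes m <= n <= strlen(s).

```c
#include <string.h>

/* Delete characters m..n-1 from s by sliding the tail down.
   delete_range is a hypothetical helper; it assumes m <= n <= strlen(s).
   memmove is used because source and destination overlap. */
void delete_range(char *s, size_t m, size_t n)
{
    memmove(&s[m], &s[n], strlen(&s[n]) + 1);   /* +1 carries the null along */
}
```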


If you want to do something a little more
sophisticated, you can always use the character
array directly, but remember to keep track of the
null which terminates the string. The second
commonest error in working with C strings is either
to lose the terminating null or to acquire one in
the wrong place.

Of course, the most common error in using C
strings is not to reserve enough space for them.
It's all too easy to forget that you're not working
with a dynamic storage allocation system but a 
fixed-length facility. Still, you could always have 
a go at programming your own dynamic string 
handler... 


In theory 


Real-world events 
Since Windows arrived, event-driven
programming has been flavour of
the moment. I've written about it several
times in this column, but there's
one aspect of event handling that I haven't raised:
handling real-world events.

The traditional theory of real-world events 
seems to centre on the problem caused by syn- 
chronisation and ‘deadlock’. Deadlock is what 
happens when process A claims resource 1 and 
process B claims resource 2, and then to contin- 
ue process A needs to claim resource 2 while B 
needs 1. Both processes hang, waiting for the 
other to complete. Deadlock was once an impor- 
tant theoretical problem, but in these days of 
interactive computing there’s usually a human 
around to notice the problem and kill one of the 
processes so that the other can complete. If you 
don’t think deadlock occurs very often, you obvi- 
ously don’t use a network! 

Deadlock may be a fun idea, and even occa- 
sionally important, but I think we’ve missed an 
entire area of difficulty that’s of more vital signif- 
icance. Let me give you an example. Back in the 
days when I used to teach operating system the- 
ory (cue Hovis theme music and yellowed film of 
small boy wheeling his pushbike home with a 
fresh Unix implementation in the basket), a
standard assignment was to ask the students to work
out how to implement a time-sharing system that
booked aeroplane seats.

This is a textbook problem, and nearly all of 
them would find the correct answer based on 
using some sort of record locking. This stops two 
order entry operators trying to reserve the same 
seat at the same time. The sequence of events 
goes: operators A and B get a phone call at the 
same time; they both enquire about the availabil- 
ity of seat 1 and are told it’s free (after all, it is!); 
so they both book it for a client. The correct 
method for avoiding this is to lock the record that 
records the state of seat 1 while there’s an 
enquiry in progress, so that another operator can 
neither gain information about it nor book it 
while the first transaction is in progress. 
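
The locking idea can be sketched in a few lines of C.
This is an illustration under stated assumptions, not
a real reservation system: the struct layout and the
function names are invented here, and in a genuine
multi-user system the test-and-set inside lock_seat
would itself have to be made atomic by the operating
system or database.

```c
#include <stdbool.h>

/* One record per seat. The layout and names are illustrative only. */
struct seat {
    bool locked;   /* an enquiry or booking is in progress */
    bool booked;   /* the seat has been sold */
};

/* Try to claim the record; fails if another operator holds it. */
bool lock_seat(struct seat *s)
{
    if (s->locked)
        return false;
    s->locked = true;
    return true;
}

void unlock_seat(struct seat *s)
{
    s->locked = false;
}

/* Enquire and book as one transaction under the lock.
   Returns true only if this operator got the seat. */
bool book_seat(struct seat *s)
{
    bool ok;

    if (!lock_seat(s))
        return false;      /* someone else is mid-transaction: no answer given */
    ok = !s->booked;
    if (ok)
        s->booked = true;
    unlock_seat(s);
    return ok;
}
```

Because the enquiry and the booking happen under one
lock, operator B can't be told the seat is free while
operator A is part-way through selling it.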


Out of lock 

You can manipulate this exercise to illustrate the 
potential for deadlock in any situation using lock- 
ing. For example, suppose operator A books seat 
1 and operator B books seat 2, and then both 
clients ask to extend their booking to two adjacent
seats in the same row (there being three
seats to a row in our example aircraft). This is a 
sort of weak deadlock, because neither can claim 
the adjacent seat while the other has booked it. 
In this case it’s easier to leave the solution up to 
the operators. However, there’s a sub-problem 
here that’s very rarely discussed and is much 
more interesting. 

Let’s say the operators have booked all the 
seats on the plane by telephone, and the system 
has sent out invoices to all the passengers. 
Potential new bookings are being told that the 
flight is full. Then, one week before the plane is 
scheduled to leave, it’s noticed that none of the 
passengers have paid for their seats. What’s the 
status of the seats? Should they be resold? 
Should the potential passengers be contacted 
again? Is there enough time? Will the plane take 
off empty, or will it take off with a different set of 
people while the original set cause a riot in the 
airport because their seats were resold? 

This part of the question generally caused 
my students much more trouble. Some would try 
to argue that this wasn’t a programming or even 
a computer issue at all, but something to do with 
the way a company is run. I have a certain sym- 
pathy with this point of view, but as time goes on 
computer systems are expected to model the 
way that businesses work more and more accu- 
rately. 

The real difficulty with the seat booking
problem is that the two events, booking a seat
inside the software and booking a seat outside
the software, weren't synchronised. The program
regarded a seat as booked from the moment
that the customer placed the order, but in reality
the exact moment when a seat is booked is
surprisingly difficult to pin down. The logical thing
is to try to pin it down, and most people would add
the condition that a seat is not booked until it's
paid for.
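
The 'not booked until paid for' rule amounts to giving
each seat three states instead of two. The sketch
below is one way it might be written; the state and
function names are assumptions made for illustration.

```c
/* Seat states; the names are illustrative assumptions. */
enum seat_state { SEAT_FREE, SEAT_RESERVED, SEAT_PAID };

/* A customer places an order: the seat is only held, not yet booked. */
int reserve(enum seat_state *s)
{
    if (*s != SEAT_FREE)
        return 0;
    *s = SEAT_RESERVED;
    return 1;
}

/* Payment arrives: only now does the seat count as booked. */
int confirm_payment(enum seat_state *s)
{
    if (*s != SEAT_RESERVED)
        return 0;
    *s = SEAT_PAID;
    return 1;
}

/* The deadline passes without payment: put the seat back on sale. */
int release_unpaid(enum seat_state *s)
{
    if (*s != SEAT_RESERVED)
        return 0;
    *s = SEAT_FREE;
    return 1;
}
```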

Problem solved? Let’s see. Repeat the entire 
exercise, but this time allow the customers to pay 
by credit card. Let the program mark off the seats
as booked, and then, one hour before the plane 
takes off, have the credit card company void all 
the claims for a range of data entry errors. Now 
are the seats booked? 


Stock answers 

This difficulty in synchronising simple events in 
a program to the rather vaguer real-world events 
is a problem that recurs time and time again, and 
yet most programmers don’t recognise all the 
instances of it as examples of the same thing. You
try to buy some item in a showroom; you're told 
that it’s in stock; you go to the warehouse to pick 
the item up, only to find that it’s out of stock. The 
reason is that the act of removing items from 
shelves for reasons other than a sale (eg dam- 
aged stock) isn’t precisely synchronised with 
decrementing the stock count, and the number 
of items actually on the shelf is itself not syn- 
chronised with the decrementing of the stock 
count. 

Whenever a database is changed, there’s a 
period of time when the real world may not be in 
agreement with it, and vice versa. Sometimes 
this time period is short and nothing much can 
happen to upset our data models of the world, but 
on other occasions there's a worryingly long and
indeterminate period of time when reality and a
data model go their separate ways. You should at 
least give some thought to how program and 
real-world events tie up and how they can be kept 
in step. 


Dumb PostScript printing 

I seem to have said the wrong thing
as part of my Print Talk series. I mentioned
in passing that it was easy to
write PostScript code to emulate a dumb charac- 
ter printer, and this must have struck a chord 
with readers — quite a few wrote, faxed and 
phoned to ask how it could be done. I have to 
admit that it was intended as a throw-away line 
and I didn’t expect to actually have to write an 
emulator - so perhaps I regret calling it easy... 

To emulate another printer, all you have to do
is write a PostScript program that reads in data
from the data stream and prints it using a suitable 
font. The simplest implementation of such a pro- 
gram is as follows: 


/buffer 128 string def
/dumb {%def
  {%loop
    currentfile buffer readstring
    exch show not {exit} if
  } loop
  showpage
} bind def
%
10 10 moveto
/Courier findfont 10 scalefont setfont
dumb


The first part of the program defines a subroutine 
that reads data from the current file into the 
buffer using the readstring command. This 
leaves a condition code on the top of the stack. 
The exch command brings the data to the top of 
the stack, and the show command ‘prints’ it. The 
if command then tests the condition now on the 
top of the stack, and finishes the loop when 
there's no more data to print. A showpage finishes
off the subroutine. The final three lines set the
starting position on the page, select the font to
use and start the emulator.

If you find this program difficult to read, 
recall that PostScript is a stack-oriented lan- 
guage and the operators come after the 
operands. For example, the if command is: 


boolean {procedure} if


In other words, it helps to read each line back- 
wards! 


Looping the loop 
To use this emulator, you need to store the 
PostScript commands in an Ascii file and send 
this file to the printer before trying to print any- 
thing that needs a dumb printer. It then acts as 
the PostScript header for the dumb printer com- 
mands. 

There's one extra complication: the loop will
continue forever unless it receives an EOF (end
of file) character. Now, before you start looking
up how to enter Ctrl-Z, I'd better point out that
PostScript actually uses Ctrl-D as an EOF
indicator. The reason for this is that Ctrl-D is
EOF for a serial link, and PostScript printers
were originally nearly always driven via serial
links.

So as well as the emulator file to act as a
PostScript header, you need an Ascii file
containing the single code Ctrl-D to act as a
PostScript postscript! If you're using MS-DOS 5,
all you have to do is use Edit; to enter Ctrl-D,
just type Ctrl-P followed by Ctrl-D, which should
show as a diamond. To enter Ctrl-D using Edlin,
type Ctrl-V and then press D.

[Caption: PostScript printers, such as Texas
Instruments' microLaser PS17, are very smart, so
how do you make them act dumb?]

This is a very dumb printer emulator, as it
doesn't even take notice of carriage returns, let
alone any other control codes. We can improve it
by changing the first part to:


/dumb {%def
  {%loop
    currentfile buffer readstring exch
    EOLchar
    {%loop
      search
      { show nextline }
      { show exit } ifelse
    } loop
    not {exit} if
  } loop
  showpage
} bind def


This looks complicated, but all it adds is an inner 
loop that scans the input for the end-of-line char- 
acter and calls the nextline subroutine if it finds 
one. The nextline subroutine simply moves the 
current position down by an amount equal to the 
height of the font. It also has to look out for
reaching the end of the current page and starting a
new one as necessary.

The search command is an innocent-looking
thing, but it does rather more than you might
imagine. It looks in the string just below the top
of the stack for the target string on the top of
the stack. If it finds the target string, it splits
the searched string into three parts (the bit after
the match, the match, and the bit before the match)
and pushes them onto the stack in that order,
followed by a Boolean True. If it doesn't find the
target, it leaves the searched string on the stack
and pushes a Boolean False on top of it. Armed with
this information, you should now be able to see how
the inner loop 'peels off' sections of the original
string until it's all used up.
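
If the stack juggling is hard to follow, the same
peeling-off can be mimicked in C. This is only an
analogy, not a translation of the PostScript: the
names ps_search and count_sections are invented for
illustration, and the sketch counts the sections
rather than printing them.

```c
#include <string.h>

/* A C stand-in for PostScript's search. If seek occurs in s, store the
   length of the part before the match in *pre_len, point *post at the
   part after it and return 1 (true); otherwise return 0 (false).
   ps_search is a hypothetical helper; seek must not be empty. */
int ps_search(const char *s, const char *seek,
              size_t *pre_len, const char **post)
{
    const char *match = strstr(s, seek);

    if (match == NULL)
        return 0;
    *pre_len = (size_t)(match - s);
    *post = match + strlen(seek);
    return 1;
}

/* Peel sections off text one end-of-line at a time, as the inner loop
   does, and return how many sections would be shown. */
int count_sections(const char *text, const char *eol)
{
    size_t pre_len;
    const char *post;
    int shown = 0;

    while (ps_search(text, eol, &pre_len, &post)) {
        shown++;          /* the emulator would show the first part, then call nextline */
        text = post;      /* carry on with the part after the match */
    }
    return shown + 1;     /* the final piece, shown when search reports false */
}
```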


Whole in one 
All that remains is to write some 
of the code that surrounds the key sub- 
routine which does all the work. Rather than pre- 
sent it piecemeal, here’s the complete PostScript 
dumb printer emulator: 


/buffer 5000 string def
/leftmargin 72 def
/topmargin 72 def
/bottom 72 def
/top 792 topmargin sub def
/ptsize 10 def
/lead 10 def
/EOLchar (\n) def
%
/nextline {
  currentpoint exch pop lead sub
  dup bottom lt {
    showpage
    pop top
  } if
  leftmargin exch moveto
} bind def
%
/dumb {%def
  {%loop
    currentfile buffer readstring exch
    EOLchar
    {%loop
      search
      { show nextline }
      { show exit } ifelse
    } loop
    not {exit} if
  } loop
  showpage
  leftmargin top moveto
} bind def
%
/Courier findfont
ptsize scalefont setfont
leftmargin top moveto
dumb


Enter this lot as an Ascii text file and save it with 
the file name Dumb.txt. You'll still need the file 
with the single Ascii EOF (ie Ctrl-D) character 
discussed above, called, let's say, Eof.txt.

Now when you want to send raw Ascii text 
to the PostScript printer, all you have to do is 
send Dumb first, then the text, and finally the 
EOF. For example, if your Ascii text is in a file 
called Mytext.txt, you can achieve all of this 
using: 


copy dumb.txt+mytext.txt+eof.txt prn


If you want to send 'live' text (say, from a dir
command) to the printer, you have to use something
like:


copy dumb.txt prn 
dir >prn
copy eof.txt prn 


You can see the idea — first install the emulator 
program and start it running, then send as much 
text as you like, and finally stop the emulator run- 
ning with an EOF. Note that the emulator will be 
stored in the printer until someone resets it or 
switches it off. 


Programmer’s Challenge 


Progress report
If you've been wondering what happened
to the last Programmer's
Challenge, Number Crunch, it may
be a relief to know that your entry hasn't
vanished into a black hole, and no, you haven't
missed the results. The simple truth is that the
number of entries was so large, as indeed were
most of the entries themselves, that I'm still
working my way through them. Given that each
contestant seems to have invested many hours — 
or more likely days, and in some cases weeks — 
in his or her solution, it wouldn’t be right to just 
skim through the entries, so it does take a lot of 
time. Many apologies, but that’s what you get if 
you will insist on entering in such numbers! 
The good news is that it looks as though the 
analysis will be completed in time for next 
month’s deadline, so watch this space. 


Rabin’s algorithm 

You may think of prime numbers as 
nothing more than a mathematical 
curiosity, but if you've been keeping
up to date on your cryptography you'll know 
they’re the key (literally!) to the newer public 
coding systems. Finding primes and proving that 
a number is a prime are now big business. 

At first sight the problem looks trivial. A 
prime is a number that has no factors apart from 
1 and itself. So 2 is prime and so is 3; 4 isn’t 
because it’s 2x2; 5 is, but 6 is 2x3; and so on. So 
how can you tell if a number is a prime? The
simplest method is to try to divide it:


i=2
DO
  IF number/i=number\i THEN EXIT LOOP
  i=i+1
LOOP UNTIL i>=number/2


In this example, / and \ are real and integer divi- 
sion respectively, and if n1 is divisible by n2 the 
two methods produce the same result because 
the fractional part of the real division will be zero. 
If you don’t like using real and integer division 
use (number MOD i), which is the remainder on 
dividing number by i. 
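
For comparison, here's a sketch of the same trial
division in C using the remainder operator (% in C
plays the part of MOD). The function name
is_prime_trial is illustrative, and treating numbers
below 2 as not prime is an assumption added here.

```c
/* Trial division using the remainder operator (% in C, MOD in Basic).
   Returns 1 if number is prime, 0 otherwise. is_prime_trial is an
   illustrative name; numbers below 2 are treated as not prime. */
int is_prime_trial(unsigned long number)
{
    unsigned long i;

    if (number < 2)
        return 0;
    for (i = 2; i <= number / 2; i++)
        if (number % i == 0)
            return 0;      /* i divides number exactly, so it has a factor */
    return 1;
}
```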

This algorithm looks innocent enough, but
for cryptography purposes we need to work with
very large primes (100 digits is a small number in
this game) and dividing by every possible smaller
number just takes too long: a time equal to the age
of the universe, assuming you've got a fast
computer!

Once you start thinking 
about the prime number 
problem, it’s very difficult 
not to find short cuts. For 
example, you can cut down 
on the number of divisions 
by not bothering with all the 
even numbers — it’s obvious 
that, after 2, only odd num- 
bers can be prime. This 
halves our work, but what 
about ruling out multiples of 
3, and then multiples of 5, 
and so on? 

This may sound like a 
great method, but as it turns 
out the task of keeping track 
of all those multiples of smaller primes is diffi- 
cult, and it doesn’t help much with the really big 
numbers. 

In fact, this method is the basis of the sieve of 
Eratosthenes, which is commonly used as a test 
for compiler quality. The method first fills an
array with values from 2 to N. It then starts with
the value 2, and strikes out of the array all
values stored at 2*i (for i = 2, 3, 4 and so on).
Then it scans along the array to find the first
non-struck-out value, which is of course 3, and goes
and strikes out all values stored at 3*i, and so on.
Each time it finds the
next value it’s found a prime, and strikes out all 
multiples of it in the array. When the algorithm 
has finished you have all the primes between 1 
and N stored in the array. Great! Except that this 
method is still far too slow and we couldn’t afford 
the storage anyway. 
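
A compact sketch of the sieve in C shows where the
storage goes: one array element for every number up
to N. The function name count_primes and the choice
to return a count rather than the list of primes are
assumptions made to keep the example short.

```c
#include <stdlib.h>

/* Sieve of Eratosthenes: returns the number of primes up to n,
   or -1 if the array can't be allocated. count_primes is an
   illustrative name. */
long count_primes(unsigned long n)
{
    unsigned char *struck;          /* struck[v] != 0 means v is crossed out */
    unsigned long v, multiple;
    long count = 0;

    if (n < 2)
        return 0;
    struck = calloc(n + 1, 1);      /* one byte per number: this is the cost */
    if (struck == NULL)
        return -1;
    for (v = 2; v <= n; v++) {
        if (struck[v])
            continue;               /* already struck out, so not prime */
        count++;                    /* first survivor is the next prime */
        for (multiple = 2 * v; multiple <= n; multiple += v)
            struck[multiple] = 1;   /* strike out 2v, 3v, 4v... */
    }
    free(struck);
    return count;
}
```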


Surely not 

As far as we know there's no method that will
prove a number to be prime in a reasonable time,
but there is an algorithm that will prove that a
number is not prime, to any degree of certainty
you care to specify, in a reasonable time. You
might try the following method:


• Pick x to be a number in the range 2 to n/2
• Work out n mod x
• If n mod x is 0 then n is not a prime


If you repeat this for enough values of x and don't
get a 0 value for the remainder, you have good
evidence for believing that n is prime, but how
much evidence? The problem is that you don't
know what proportion of values between 2 and
n/2 are capable of proving n to be non-prime, so
you don't know the probability of finding such
a value of x. This is where Rabin's theorem comes
in, and it goes like this.

A witness to the non-primality of n is a number
w satisfying either:


A. w^(n-1) mod n is not equal to 1
B. For some integer k for which (n-1)/(2^k) is a
   whole number, 1 < gcd(w^((n-1)/(2^k))-1,n) < n


where gcd(x,y) is the greatest common divisor
function, ie the largest number that divides both
x and y exactly.

Now if you can find a single witness for n, it's
definitely not a prime. Under condition A it fails
Fermat's test, which every prime passes; under
condition B it has a greatest common divisor with
another number that's greater than 1 and less than
n, in other words it has a factor. (Notice that a
witness isn't itself necessarily a factor of n.)

At this point the idea may sound ridiculous,
but Rabin managed to prove that for any
non-prime n more than 50 percent of the set of
numbers (2, 3, 4... n-1) are witnesses. Now we have
a method that works. As the probability of picking
a witness from the set is at least 1/2, the
probability of picking m numbers from the set
without finding a witness is at most 1/(2^m). So all
we have to do is choose a level of certainty as
1-1/(2^m), pick m random numbers from the set
(2, 3... n-1), and test them to see if any is a
witness. If no witnesses are found, the probability
that n is a prime is 1-1/(2^m).
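
Here's a hedged sketch of the test in C. It assumes
the usual statement of Rabin's conditions: w is a
witness if w^(n-1) mod n is not 1, or if for some k
with (n-1)/(2^k) whole, gcd(w^((n-1)/(2^k))-1,n) lies
strictly between 1 and n. The function names are
invented for illustration, n is limited to 32 bits so
the arithmetic can't overflow, and rand() is used
only for brevity (on some systems it covers a far
smaller range than 2 to n-1).

```c
#include <stdint.h>
#include <stdlib.h>

static uint64_t gcd(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* (base^exp) mod n; safe from overflow because n is below 2^32. */
static uint64_t power_mod(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;

    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % n;
        base = base * base % n;
        exp >>= 1;
    }
    return result;
}

/* Is w a witness to the non-primality of n? Assumes 1 < w < n. */
int is_witness(uint32_t w, uint32_t n)
{
    uint64_t e = (uint64_t)n - 1;

    if (power_mod(w, e, n) != 1)
        return 1;                   /* condition A */
    while (e % 2 == 0) {            /* each k for which (n-1)/2^k is whole */
        uint64_t g;

        e /= 2;
        g = gcd((power_mod(w, e, n) + n - 1) % n, n);  /* w^e - 1, kept non-negative */
        if (g > 1 && g < n)
            return 1;               /* condition B */
    }
    return 0;
}

/* Try m random values; returns 0 as soon as a witness shows n is not
   prime, 1 if none turns up. probably_prime is an illustrative name. */
int probably_prime(uint32_t n, int m)
{
    int i;

    if (n < 4)
        return n == 2 || n == 3;
    if (n % 2 == 0)
        return 0;
    for (i = 0; i < m; i++) {
        uint32_t w = 2 + (uint32_t)(rand() % (n - 3));   /* 2 .. n-2 */

        if (is_witness(w, n))
            return 0;
    }
    return 1;
}
```

A prime has no witnesses at all, so probably_prime
never rejects one; the only possible mistake is
passing a non-prime whose m samples all happened to
miss.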

To see just how effective this method is, look
how quickly the probability of n being a prime
rises even for small values of m:


m   P          m   P

1   .5         6   .984375
2   .75        7   .9921875
3   .875       8   .99609375
4   .9375      9   .998046875
5   .96875     10  .9990234375


Putting this another way, if n is not a prime you
have a better than .999 probability of finding a
witness in only 10 goes. If you try 20 values,
incidentally, the probability rises to
.999999046326.

This method of testing primes is interesting, 
not only because it gives us a relatively very fast 
practical test to any desired degree of confidence 
but because it introduces a general method. If 
you want to test for anything and it takes too long, 
find a witness. Bear in mind, however, that this is 
only of any use if you can prove how many
witnesses you can expect to find by randomly
sampling a given set.


